94

Binary Neural Architecture Search

The value of selecting an operation ˜rk is the expected reward  rewardi we receive when

we take an operation from the possible set of operations. If nk approaches infinity, ˜rk

approaches the actual value of the operation rk. However, the number of operations nk

cannot be infinite. Therefore, we should approximate the actual value as closely as possible

through the variance.

Definition 2. There exists a difference between the estimated probability ˜rk and the actual

probability rk, and we can estimate the variance concerning the value

˜δk =

2 ln N

n

,

(4.3)

where N is the total number of trails.

Proof. Suppose X[0, 1] represents the theoretical value of each independently distributed

operation. n is the number of times the arm has been played up to trial, and pi is the actual

value of the operation in the ith trail. Furthermore, we define p =



i pi

n

and q = 1p.

Since the variance boundary of independent operations can represent the global variance

boundary (see the Appendix), based on Markov’s inequality, we can arrive at below :

P[X > p + δ] = P[



i

(Xipi) > δ]

= P[eλ 

i(Xipi) > eλδ]

E[eλ 

i(Xipi)]

eλδ

.

(4.4)

Since we can get 1 + xex 1 + x + x2 when 0 ≤|x| ≤1), E[eλ 

i(Xipi)] in Eq. 4.4

can be further approximated as follows:

E[eλ 

i(Xipi)] =

!

i

E[eλ(Xipi)]

!

i

E[1 + λ(Xipi) + λ2(Xipi)2]

=

!

i

(1 + λ2v2

i )

eλ2v2,

(4.5)

where v denotes the variance of X. Combining Eq. 4.4 and Eq. 4.5 gives P[X > p +

δ]eλ2v2

eλδ . Since λ is a positive constant, it can be obtained by the transformation of the

values P[X > p + δ]e22. According to the symmetry of the distribution, we have

P[X < pδ]e22. Finally, we get the following inequality:

P[|Xp| ≤δ]12e22.

(4.6)

We need to decrease δ as operating recommendations increase. Therefore, we choose



2 ln N

n

as ˜δ. That is to say, p



2 ln N

n

Xp +



2 ln N

n

is implemented at least with

probability 1

2

N 4 . The variance value will gradually decrease as the trail progresses, and ˜rk

will gradually approach rk. Equation 4.7 shows that we can achieve a probability of 0.992

when the number of the trail gets 4.

12

N 4 =

0.857

N=2

0.975

N=3

0.992

N=4.

(4.7)